An open source framework for end-to-end re-forecast studies in GSI-WRF-MET

By Colin Grudzien

With thanks to:

Minghua Zheng, Ivette Hernández Baños, CW3E’s USAF Group and CW3E’s AR-Recon Team for important discussions about the GSI-WRF-MET stack and sharing library implementation details.

With very special thanks to:

Caroline Papadopoulos, Christopher Harrop, CW3E’s West-WRF Team / NRT-Team and CW3E’s Forecast Verification Team for sharing the code that formed the basis of these workflows.
Center for Western Weather and Water Extremes.

Data assimilation as a workflow process

  • Data assimilation is a notoriously complex operational problem: it handles large volumes of NWP data through the interdependent steps of:

    • procuring;
    • pre-processing;
    • generating; and
    • post-processing.

  • From Christopher Harrop:

    Workflow Management is a concept that originated in the 1970’s to handle business process management. Workflow management systems were developed to manage complex collections of business processes that need to be carried out in a certain way with complex interdependencies and requirements.
    …scientific workflows are driven by the scientific data that “flows” through them… usually triggered by the availability of some kind of input data, and a task’s result is usually some kind of data that is fed as input to another task in the workflow.
  • The complexity of data assimilation cycling is demonstrated in the following data flow diagram…

Data assimilation in a GSI-WRF-MET Stack

Diagram of data flows in GSI-WRF-cycling.
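
The interdependence of these steps can be sketched as a directed acyclic graph, where a task becomes runnable only once the data it consumes exists. The following is a minimal illustration using Python's standard `graphlib` module (Python 3.9+). The task names and the edges between them are assumptions chosen for illustration only; they are not taken from the actual workflow definition.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph for a single cycle: each key maps to the set of
# tasks whose output it consumes. Names and edges are illustrative only.
CYCLE_TASKS = {
    "ungrib": set(),
    "obs_prep": set(),
    "metgrid": {"ungrib"},
    "real": {"metgrid"},
    "gsi": {"real", "obs_prep"},
    "wrf": {"gsi"},
    "met_verification": {"wrf"},
}


def ready_batches(graph):
    """Yield sorted lists of tasks whose inputs are all available."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        batch = sorter.get_ready()
        yield sorted(batch)
        # Mark the whole batch as finished so that dependents are released.
        sorter.done(*batch)


batches = list(ready_batches(CYCLE_TASKS))
for step, batch in enumerate(batches):
    print(step, batch)
```

Tasks in the same batch have no dependence on one another, so a workflow manager is free to submit them to the scheduler at the same time.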

Data assimilation as a statistical learning problem

  • By training, I analyze the data assimilation problem within the framework of a statistical learning problem.

  • To perform my research, I need to run many simulations in order to:

    • study hyper-parameter sensitivity in the learning problem; and
    • generate a sample large enough to validate conclusions statistically with, e.g., hypothesis testing and/or Bayesian modelling techniques.
  • My re-forecasting workflows are also non-standard from the perspective of operational forecasting:

    • rather than generating, e.g., a 10-day forecast at every zero hour, I need to run each forecast up to a specific valid time for verification, with the forecast length varying to optimize resources.
  • I also perform simulations on multiple HPC platforms with different system architectures, job schedulers and software stacks, so I need to keep my software as portable and system-agnostic as possible.
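
The idea of running each forecast only up to a fixed verification valid time can be sketched as follows. This is a minimal illustration, not the implementation used in the workflows: the function name `forecast_schedule`, the 24-hour cycle interval, the 120-hour maximum length and the dates are all hypothetical.

```python
from datetime import datetime, timedelta


def forecast_schedule(valid_time, first_cycle, cycle_interval_hours, max_length_hours):
    """Return (cycle start, forecast length in hours) pairs ending at valid_time."""
    step = timedelta(hours=cycle_interval_hours)
    schedule = []
    start = first_cycle
    while start < valid_time:
        # Forecast length is exactly the lead time to the verification time.
        length = int((valid_time - start).total_seconds() // 3600)
        if length <= max_length_hours:
            schedule.append((start, length))
        start += step
    return schedule


# Hypothetical case study: verify at 2021-01-29 00Z with daily cycles.
valid = datetime(2021, 1, 29, 0)
schedule = forecast_schedule(valid, datetime(2021, 1, 24, 0), 24, 120)
for start, length in schedule:
    print(start.isoformat(), length)  # lengths: 120, 96, 72, 48, 24 hours
```

In this example the five cycles need 360 forecast hours in total, whereas running a fixed 10-day forecast from each of the same zero hours would need 1200.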

Data assimilation as an HPC workflow problem

  • These research demands have led me to develop an end-to-end data assimilation cycling system in the GSI-WRF-MET stack using the Rocoto Workflow Manager.

    • This builds principally on the NRT and Verification Teams' Rocoto / MET workflows, which provided the basis for the data flows.
    • Christopher Harrop, the creator of Rocoto, shared open source workflow scripts for all WRF steps, and templates for GSI integration, that are used currently at CW3E for its operational NRT products.
    • As a byproduct of research efforts, I have integrated these codes into a unified system for case-study analysis, using a Bash / Python data science stack.
  • This currently includes a user-facing IPython API for Rocoto workflow commands, and plotting results in Matplotlib.

  • The GSI-WRF-Cycling-Template and MET-tools code repositories are licensed for reuse, redistribution and modification under the Apache 2.0 Open Source License.

  • NOTE: the current version has no documentation beyond code comments, because it is to be replaced as soon as possible with a new version built on the Cylc workflow system.

    • Conversion to Cylc is necessary for long-term support, and for developing a unified system for re-forecasting with “equivalent” experiments in GSI-WRF and JEDI-MPAS with verification performed on common metrics in MET.
    • This is intended to build on the open source JEDI-MPAS workflow templates developed at NCAR.
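
A user-facing Python layer over the Rocoto workflow commands mentioned above could look roughly like the sketch below. It is not the API shipped in the repositories: the function name `rocoto_command` and the file names are hypothetical, and the sketch assumes the Rocoto command-line tools `rocotorun`, `rocotostat`, `rocotoboot` and `rocotorewind` with their `-w` (workflow XML), `-d` (database), `-c` (cycle) and `-t` (task) options. It only builds the argument list and does not execute anything.

```python
# Assumed mapping from a short action name to a Rocoto executable.
ROCOTO_TOOLS = {
    "run": "rocotorun",
    "stat": "rocotostat",
    "boot": "rocotoboot",
    "rewind": "rocotorewind",
}


def rocoto_command(action, workflow_xml, database, cycle=None, task=None):
    """Build the argument list for one Rocoto command-line call."""
    if action not in ROCOTO_TOOLS:
        raise ValueError(f"unknown Rocoto action: {action!r}")
    if action == "run" and (cycle is not None or task is not None):
        # This sketch only passes cycle/task filters to stat, boot and rewind.
        raise ValueError("cycle and task filters are not used with 'run' here")
    cmd = [ROCOTO_TOOLS[action], "-w", workflow_xml, "-d", database]
    if cycle is not None:
        cmd += ["-c", cycle]
    if task is not None:
        cmd += ["-t", task]
    return cmd


# Hypothetical file names and cycle string, for illustration only.
status_cmd = rocoto_command("stat", "workflow.xml", "workflow.db", cycle="202101240000")
print(" ".join(status_cmd))
```

The returned list can be passed to `subprocess.run` from an IPython session, which keeps the command construction easy to test separately from any HPC system.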

Data assimilation in a GSI-WRF-MET Stack

XML file controlling workflow.

Data assimilation in a GSI-WRF-MET Stack

Rocoto logs for GSI-WRF-cycling.